University of Oslo

We show that one can perform causal inference in a natural way 
for continuous-time scenarios using tools from stochastic analysis. 
This provides new alternatives to the positivity condition for in- 
verse probability weighting. The probability distribution that would 
govern the frequency of observations in the counterfactual scenario 
can be characterized in terms of a so-called martingale problem. The 
counterfactual and factual probability distributions may be related 
through a likelihood ratio given by a stochastic differential equation. 
We can perform inference for counterfactual scenarios based on the 
original observations, re-weighted according to this likelihood ratio. 
This is possible if the solution of the stochastic differential equa- 
tion is uniformly integrable, a property that can be determined by 
comparing the corresponding factual and counterfactual short-term 
predictions. 

Local independence graphs are directed, possibly cyclic, graphs 
that represent short-term prediction among sufficiently autonomous 
stochastic processes. We show through an example that these graphs 
can be used to identify and provide consistent estimators for coun- 
terfactual parameters in continuous time. This is analogous to how 
Judea Pearl uses graphical information to identify causal effects in 
finite state Bayesian networks. 

1. Introduction. While randomized controlled trials are the gold stan- 
dard for determining the effects of public health interventions or medical 
treatments, there are many situations where such trials are unethical, and 
it is tempting to turn to registry data or observational studies for quality 
assessment of treatments. However, data from such sources is subject to 
various selection effects from drop-out due to underlying health problems
to selection of the treatment itself. These problems have motivated the de- 
velopment of the field of causal inference, including in particular the area 
of marginal structural models [24, 25] which have seen applications, for in- 
stance, in HIV cohort studies [28]. The underlying idea is that observational 
data can be used to mimic a relevant hypothetical controlled trial or coun- 
terfactual scenario. 

In this paper, our primary concern is the possibility of estimating parame- 
ters in a model for the observations from a counterfactual scenario involving 
a relevant hypothetical randomized controlled trial. While the specification 
of an appropriate model for the counterfactual observations is an important 
topic in itself, we will focus solely on a situation in which such a counter- 
factual model has been specified correctly. It is common to re-weight the 
observational data in order to mimic observations coming from the counter- 
factual scenario. This is usually referred to as inverse probability weighting. 
Such re-weighting has occasionally been reported to be too unstable, even 
inconsistent, for various purposes; see [7]. It is therefore of great interest to 
understand when this strategy actually works. We will provide some rigorous 
conditions for such re-weighting to be achievable. A similar exposition has 
not been carried out in the literature before, except partly in [25] and [7]. 
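
The re-weighting idea can be illustrated on a small finite example. The sketch below is not taken from the paper: the confounder `L`, the treatment `A`, the outcome model `mean_y` and all numbers are hypothetical, and expectations are computed by exact enumeration rather than from data.

```python
from itertools import product

# Hypothetical toy model (an assumption, not from the paper):
# binary confounder L and binary treatment A.
p_l = {0: 0.5, 1: 0.5}            # P(L = l)
p_a1_given_l = {0: 0.2, 1: 0.8}   # P(A = 1 | L = l)


def mean_y(a, l):
    """E[Y | A = a, L = l] in the toy model."""
    return 1.0 + 2.0 * a + 3.0 * l


def p_a_given_l(a, l):
    """Factual treatment probability P(A = a | L = l)."""
    return p_a1_given_l[l] if a == 1 else 1.0 - p_a1_given_l[l]


def p_joint(l, a):
    """Factual probability P(L = l, A = a)."""
    return p_l[l] * p_a_given_l(a, l)


def ipw_mean(a_star):
    """Re-weighted factual mean of Y mimicking the trial 'set A = a_star'."""
    total = 0.0
    for l, a in product((0, 1), (0, 1)):
        if a != a_star:
            continue  # weight 0: the observation disagrees with the enforced treatment
        total += p_joint(l, a) * mean_y(a, l) / p_a_given_l(a, l)
    return total


def naive_mean(a_star):
    """Unweighted conditional mean E[Y | A = a_star], biased by confounding."""
    num = sum(p_joint(l, a_star) * mean_y(a_star, l) for l in (0, 1))
    den = sum(p_joint(l, a_star) for l in (0, 1))
    return num / den


def target_mean(a_star):
    """Mean of Y in the hypothetical trial where A is set to a_star for everyone."""
    return sum(p_l[l] * mean_y(a_star, l) for l in (0, 1))
```

In this toy model the weighted mean agrees with the trial mean, while the unweighted conditional mean does not. The weights stay bounded here because every treatment probability is bounded away from zero.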

A probability distribution on the underlying sample space that would 
govern the frequency of observations in the counterfactual scenario can be 
characterized in terms of a so-called martingale problem. Short-term predic- 
tions provide dynamical characterizations of the various involved modules. 
A hypothetical direct intervention on a module would change its dynamics. 
The nondirectly intervened modules on the other hand, should have the same 
dynamical characterization as in the factual scenario. Martingale problems
have been thoroughly studied in stochastic analysis, which means that
well-developed tools exist for determining the feasibility of
the previous re-weighting methods. An immediate application of these tools
yields, for instance, that the probability distribution that would govern the 
frequencies of events in the counterfactual situation is unique if it exists; see 
Theorem 4 in the Appendix. 

If the re-weighting is feasible, is it then at all possible to estimate the 
parameters of interest in the counterfactual model from the re-weighted ob- 
servations? In other words, are these parameters identifiable? Pearl's strat- 
egy [21] is to take advantage of graphical structure, in terms of conditional 
independences, for identification of causal effects. It was shown in [12, 27] 
and [10] that this strategy gives a complete theory in the simpler setting of 
finite state or Gaussian-Bayesian networks. For more complicated settings, 
this problem is far from solved. Some results in this direction for time series 
were given in [11]. We show that it is possible to take advantage of local 
independence graphs for identification of causal effects in continuous-time
settings. Note, as this general problem is very hard, we do not provide a
complete theory for identification of causal effects, only an example which 
slightly extends [19]. 

The idea that the counterfactual situation can be assigned probabilities in
a way that is consistent with a purely observational scheme is not new. It has
also been considered in the general context of marked point processes in
[3, 4, 8, 20] and [25]. We choose a martingale-based approach, similar to [25].
Note also that graphical models based on local independence and doubly 
stochastic Poisson processes were studied thoroughly in [9]. Continuous- 
time counterfactual interventions were also considered by Lok in [18]. She 
considered structural nested models in continuous time and applied ideas 
from structural equation modeling to survival data. Her strategy differs from 
ours in that we take a purely nonparametric point of view, through change 
of probability measures. 

In Section 2 we describe models for the factual scenario. We then proceed 
in Section 3 with a description of counterfactual variables and distributions. 
In Section 4, we give a sufficient condition for such a counterfactual distri- 
bution to exist, and also a construction based on martingale methods. In 
Section 5, we introduce local independence graphs that play the same role 
as directed acyclic graphs usually do in the literature on causal inference. 
In Section 6, we consider an example where we can identify and consistently
estimate controlled direct effects in event history analysis. Finally, in the
Appendix, we summarize some properties of dual predictable projections 
and consider uniqueness of counterfactual distributions. 

2. The observational regime and autonomous modules. Eventually, we 
will consider statistical analyses based on observations of several i.i.d. indi- 
viduals, but first we will consider models for one "generic" individual. We 
aim to investigate complex systems for each individual formed by finitely 
many autonomous modules that develop and influence each other through- 
out time. We will not provide a detailed recipe for building appropriate 
models, but simply assume a stochastic model for a generic individual that 
has some specific properties. 

2.1. The underlying probability space and marked point processes. We
let $\mathcal{V}$ denote the finite set of modules that form the system of interest.
The possible outcomes of these modules are supposed to be realized on
a probability space $(\Omega, \mathcal{F}, Q)$ with some additional structure that we will
now describe. Note that we do not assume that the actual frequencies of
outcomes will be governed by the probability measure $Q$. This measure will
only play a role as a "reference measure." The possible "initial" outcomes
of each module $V$ are given by the outcomes of a corresponding random
variable $V_0$. The random variables in this family, which we denote by $\mathcal{V}_0$, are
mutually independent with respect to $Q$. The initial outcome of each $V \in \mathcal{V}$
occurs at a, possibly unknown, time point $T(V_0) \leq 0$. The ordering of these
time points is assumed to be known. We moreover let

(2.1) $p(V_0) := \{V_0' \in \mathcal{V}_0 \mid T(V_0') < T(V_0)\}$,

and sometimes refer to this set as the past of $V_0$.

The outcomes in the follow-up are driven by a multivariate point process
$N$ [13] on a finite time interval $[0, T]$. Let $J$ denote the mark space of $N$.
This space is supposed to be Lusin, that is, a Borel subset in a compact
metric space, and equipped with the Borel $\sigma$-algebra $\mathcal{J}$. We assume that for
every module $V$, there exists a $J_V \in \mathcal{J}$ such that

(2.2) $V_t(\omega) = V_0(\omega) + \int_{J_V} \int_0^t h(\omega, s, x) \, N(\omega, ds, dx)$,

where $h$ is a bounded process on $[0, T] \times J$ that is predictable with respect
to the filtration generated by $N|_{J_V}$ and $V_0$. We also assume that $\mathcal{V}_0 \perp\!\!\!\perp_Q N$
and that $\bigcup_{V \in \mathcal{V}} J_V$ defines a partition of $J$ such that the restricted point
processes $\{N|_{J_V}\}_{V \in \mathcal{V}}$ are mutually independent with respect to $Q$.

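For each subset $W := \{V^1, \ldots, V^d\} \subset \mathcal{V}$, let $\mathcal{F}_t^W$ denote the filtration that
is generated by $V_0$ and $N|_{J_V}$ for every $V \in W$ and also satisfies the usual
conditions; see [14]. We let $\mathcal{P}^W$ denote the predictable $\sigma$-algebra generated
by $\mathcal{F}_t^W$ [14]. For notational simplicity, we will also write $\mathcal{F}_t^V$ or $\mathcal{P}^V$ instead
of $\mathcal{F}_t^{\{V\}}$ or $\mathcal{P}^{\{V\}}$, as well as $\mathcal{F}_t$ or $\mathcal{P}$ instead of $\mathcal{F}_t^{\mathcal{V}}$ or $\mathcal{P}^{\mathcal{V}}$.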

2.2. The factual distribution. The actual frequencies of outcomes in the
model are not assumed to be governed by $Q$, but by another probability measure
$P$ such that $P \ll Q$ and

(2.3) $V_0 \perp\!\!\!\perp_P T^{-1}T(V_0) \setminus \{V_0\} \mid p(V_0)$

for every $V_0 \in \mathcal{V}_0$, that is, every $V_0$ is independent of its simultaneous
variables, conditionally on the past. We will refer to property (2.3) as
contemporaneous independence; see [11]. This is useful to us since it provides at
least one enumeration $\{V_0^1, \ldots, V_0^n\} = \mathcal{V}_0$ such that $T(V_0^i) \geq T(V_0^j)$ whenever
$i > j$ and

(2.4) $E_P[f(V_0^k) \mid V_0^1, \ldots, V_0^{k-1}] = E_P[f(V_0^k) \mid p(V_0^k)]$,

whenever $f$ is a bounded and measurable function and $1 \leq k \leq n$.

The processes in V are not necessarily mutually independent with re- 
spect to P, but are still sufficiently autonomous for our purpose. As an 
immediate manifestation of this autonomy, note that the modules may not 
"switch" states simultaneously P-a.s. The reason is that the processes in V 
are associated to disjoint subsets in the mark space J, which cannot occur 
simultaneously. We will refer to P as the factual measure. Note, however, 
as some of the processes in V may be latent, the factual measure P is also 
assumed to govern the frequency of events that may be unobserved. 

2.3. The factual likelihood ratio and its factorization. The autonomy
imposes a factorization of the likelihood ratio $\frac{dP}{dQ}$ that will prove to be
important to us. First note that a repeated use of the Radon-Nikodym theorem
provides a family $\{Z_0^V\}_{V \in \mathcal{V}}$ of nonnegative random variables such that each
$Z_0^V$ is $\mathcal{F}_0^{p(V) \cup V}$-measurable and

(2.5) $E_Q[Z_0^V \mid \mathcal{F}_0^{p(V)}] = 1$ and $\frac{dP|_{\mathcal{F}_0}}{dQ|_{\mathcal{F}_0}} = \prod_{V \in \mathcal{V}} Z_0^V$, $Q$-a.s.

There is a similar factorization of $\frac{dP|_{\mathcal{F}_t}}{dQ|_{\mathcal{F}_t}}$. Let $U$ denote the dual predictable
projection of $N$ with respect to $Q$ onto the filtration $\mathcal{F}_t$ as in [13]. By
Lemma A.2 in the Appendix there exists a nonnegative and $\mathcal{P} \otimes \mathcal{J}$-measurable
process $\lambda$ such that

$E_P\left[\int_J \int_0^T h(s, x) \, N(ds, dx)\right] = E_P\left[\int_J \int_0^T h(s, x) \lambda(s, x) \, U(ds, dx)\right]$

for every bounded and $\mathcal{P} \otimes \mathcal{J}$-measurable process $h$. As is common practice,
we mostly omit $\omega$ from equations in order to keep the notation less
overwhelming.

We now define the processes

$H^V(t) := 1 + \frac{U(\{t\}, J_V) - \int_{J_V} \lambda(t, x) \, U(\{t\}, dx)}{1 - U(\{t\}, J_V)}$

and

(2.6) $K_t^V := \int_{J_V} \int_0^t (\lambda(s, x) - H^V(s)) \, (N(ds, dx) - U(ds, dx))$.

By (A.3), we see that $\{K^V\}_{V \in \mathcal{V}}$ defines a family of local $Q$-martingales
with respect to the filtration $\mathcal{F}_t$ such that

(2.7) $[K^V, K^{V'}] = 0$, $Q$-a.s. for $V \neq V'$.

The solution of the SDE

(2.8) $Z_t = Z_0 + \sum_{V \in \mathcal{V}} \int_0^t Z_{s-} \, dK_s^V$

defines a $Q$-martingale with respect to the filtration $\mathcal{F}_t$ such that

$Z_t = \frac{dP|_{\mathcal{F}_t}}{dQ|_{\mathcal{F}_t}}$, $Q$-a.s.

for every $t \in [0, T]$. This follows directly from [13], Theorem 5.1.
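
As a sanity check on the form of this SDE, consider the simplest special case: a single counting process without marks whose compensator under $Q$ is $u \, dt$ (so $U$ has no atoms and $H \equiv 1$) and whose intensity ratio $\lambda$ is constant. The sketch below is an illustration under these assumptions only; the function names and parameter values are hypothetical. In this case the SDE $dZ = Z_{-}(\lambda - 1)(dN - u \, dt)$ with $Z_0 = 1$ has the closed-form solution $\lambda^{N_T} e^{-(\lambda - 1) u T}$.

```python
import math


def likelihood_ratio(n_jumps, lam, u, T):
    """Z_T for a counting process with Q-intensity u and P-intensity lam * u.

    Closed-form solution of dZ = Z_-(lam - 1)(dN - u dt) with Z_0 = 1.
    """
    return lam ** n_jumps * math.exp(-(lam - 1.0) * u * T)


def poisson_pmf(n, mean):
    """Probability that a Poisson variable with the given mean equals n."""
    return math.exp(-mean) * mean ** n / math.factorial(n)


def q_expectation(f, u, T, n_max=100):
    """E_Q[f(N_T)], where N_T is Poisson(u * T) under Q (series cut at n_max)."""
    return sum(f(n) * poisson_pmf(n, u * T) for n in range(n_max))


lam, u, T = 1.5, 2.0, 1.0
# Z_T has Q-expectation 1, so Z_T * Q is a probability measure.
total_mass = q_expectation(lambda n: likelihood_ratio(n, lam, u, T), u, T)
# Under the re-weighted measure the expected number of jumps is lam * u * T.
reweighted_mean = q_expectation(lambda n: n * likelihood_ratio(n, lam, u, T), u, T)
```

Re-weighting by `likelihood_ratio` turns the reference intensity `u` into `lam * u`, which is the elementary mechanism behind the change of measure from $Q$ to $P$.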

We now obtain directly from Yor's additive formula [23], Theorem II 38,
that

(2.9) $Z_t = \prod_{V \in \mathcal{V}} Z_t^V$,

where each $Z^V$ solves an SDE

(2.10) $Z_t^V := Z_0^V + \int_0^t Z_{s-}^V \, dK_s^V$.
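
The role of the orthogonality condition (2.7) in this factorization can be seen in a discrete toy setting. The sketch below is only an illustration: it assumes pure-jump paths on a common time grid, represented by hypothetical lists of jump sizes, for which the stochastic exponential is the product of the factors $1 + \Delta K_s$.

```python
def stochastic_exponential(jumps):
    """Final value of the stochastic exponential of a pure-jump path.

    The path is given by its jump sizes on a time grid: E(K) = prod (1 + dK_s).
    """
    value = 1.0
    for dk in jumps:
        value *= 1.0 + dk
    return value


def add_paths(j1, j2):
    """Jump sizes of the sum of two paths on the same grid."""
    return [a + b for a, b in zip(j1, j2)]


def covariation(j1, j2):
    """Jump sizes of [K1, K2] for pure-jump paths: products of simultaneous jumps."""
    return [a * b for a, b in zip(j1, j2)]


k1 = [0.5, 0.0, -0.2, 0.0]
k2 = [0.0, 0.3, 0.0, 0.1]  # never jumps at the same time as k1
k3 = [0.4, 0.0, 0.0, 0.0]  # shares a jump time with k1

# No common jumps: the exponential of the sum is the product of exponentials.
lhs_disjoint = stochastic_exponential(add_paths(k1, k2))
rhs_disjoint = stochastic_exponential(k1) * stochastic_exponential(k2)

# Common jumps: the covariation term has to be added to the sum.
lhs_naive = stochastic_exponential(add_paths(k1, k3))
lhs_corrected = stochastic_exponential(add_paths(add_paths(k1, k3), covariation(k1, k3)))
rhs_common = stochastic_exponential(k1) * stochastic_exponential(k3)
```

When the paths never jump together the covariation vanishes and the product formula holds without a correction term, which mirrors how (2.7) yields (2.9).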



3. Actions and counterfactual distributions. We assume that we may
directly intervene on a subset of modules $\mathcal{A} \subset \mathcal{V}$ such that their outcomes are
changed. This intervention does not directly affect the outcomes of the
modules in $\mathcal{X} := \mathcal{V} \setminus \mathcal{A}$. The latter set of modules will only be affected indirectly:
The conditional distributions of their short-term behavior, given the past, 
will remain the same, while the change of previous outcomes yields a change 
of the background these distributions depend on. We will limit our discus- 
sion to actions that are deterministically dependent on the past. These are 
sometimes referred to as conditional actions. Every conditional action will
be represented by a measurable transformation $\theta$ of the generic state space
$(\Omega, \mathcal{F})$. We think of $\theta(\omega)$ as the direct consequence in the "counterfactual
universe" where the action $\theta$ was performed.

Whenever $P'$ is a probability measure on $(\Omega, \mathcal{F})$, we let $\theta P'$ denote the
push-forward measure over $\theta$, that is, $\theta P'(F) := P'(\theta^{-1}(F))$ for every $F \in \mathcal{F}$.
Whenever $H$ is an $\mathcal{F}$-measurable random variable, we let $\theta^* H$ denote the
transformed variable, where $\theta^* H(\omega) := H(\theta(\omega))$ for every $\omega \in \Omega$. We assume
that $\theta$ is "continuous" in the sense that the reference measure $Q$ is
quasi-invariant with respect to $\theta$, that is,

(3.1) $\theta Q \ll Q$.



3.1. Actions and counterfactual distributions at baseline. Let $V \in \mathcal{V}$ and
suppose $\eta$ is an $\mathcal{F}_0^V$-measurable random variable, and $h$ is a bounded and
$\mathcal{F}_0^{p(V)}$-measurable random variable. We assume that the outcomes of the not
directly intervened part of the system are left invariant by the transformation
at baseline, that is,

(3.2) $\theta^* \eta = \eta$

for every $\eta$ and every $V \in \mathcal{X}$. We furthermore assume that the action depends
deterministically on the past outcomes in the nonintervened system, that is,
whenever $V \in \mathcal{A}$, then

(3.3) $\theta^* \eta$ is $\mathcal{F}_0^{p(V) \cap \mathcal{X}}$-measurable

for every $\eta$.

A probability distribution $P_\theta$ on $(\Omega, \mathcal{F}_0)$ defines a counterfactual
distribution at baseline if, whenever $V \in \mathcal{A}$, then

(3.4) $E_{P_\theta}[h \eta] = E_{P_\theta}[h \, \theta^* \eta]$,

and, whenever $V \in \mathcal{X}$, then

(3.5) $E_{P_\theta}[h \eta] = E_{P_\theta}[h \, \theta^* E_P[\eta \mid \mathcal{F}_0^{p(V)}]]$

for every $\eta$.

Equation (3.4) means that the short-term behavior of a directly
intervened variable is simply given by the transformed variable. Its outcome is
deterministically regulated by the past. Equation (3.5) means that the
conditional distribution of an outcome of a not directly intervened variable in
the counterfactual scenario, given its past, coincides with the corresponding
distribution from the factual scenario.

Note that Pearl's do(X = x) may also be interpreted as a transformation 
on sample space that fixes X constantly equal to x and leaves the remaining 
variables invariant. This means that our characterization of probability
measures on $(\Omega, \mathcal{F})$ that would govern the frequencies of events in our system
if we, contrary to the fact, had applied the hypothetical intervention
strategy, is a reformulation of Pearl's do-operator on Bayesian networks [21]. The
present approach, however, translates more or less directly to continuous- 
time settings. 

3.2. Actions and counterfactual distributions in the follow-up period.
Whenever $Z$ is a stochastic process on $\Omega$, we let $\theta^* Z$ denote the process given
by the transformed variables $\{\theta^* Z_t\}_{t \in [0,T]}$. We assume that $\theta^* N$ defines a
marked point process that is adapted to the history $\{\mathcal{F}_t\}_{t \in [0,T]}$. The
action $\theta$ is thought to force the outcomes $N|_{[0,T] \times J_{\mathcal{A}}}$ into the outcomes of
$\theta^* N|_{[0,T] \times J_{\mathcal{A}}}$, which will only depend on the strictly previous behavior of
the not directly intervened system, that is, whenever $B \subset J_{\mathcal{A}}$ is measurable, then

(3.6) $\theta^* N_t(B)$ is predictable w.r.t. $\mathcal{F}_t^{\mathcal{X}}$.

The outcomes of the not directly intervened part of the system are left
invariant by the transformation during follow-up, that is,

(3.7) $\theta^* N|_{[0,T] \times J_{\mathcal{X}}} = N|_{[0,T] \times J_{\mathcal{X}}}$.

We will say that $P_\theta$ defines a counterfactual distribution if it defines a
counterfactual distribution at baseline, and if whenever $X$ is a process of the
form (2.2) and $\Lambda$ is an $\mathcal{F}_t$-predictable process of finite variation such that

$E_P\left[\int_0^T h_s \, dX_s\right] = E_P\left[\int_0^T h_s \, d\Lambda_s\right]$

for every bounded and $\mathcal{F}_t$-predictable process $h$, then

(3.8) $E_{P_\theta}\left[\int_0^T h_s \, dX_s\right] = E_{P_\theta}\left[\int_0^T h_s \, d\theta^* \Lambda_s\right]$ if $V \in \mathcal{X}$

and

(3.9) $E_{P_\theta}\left[\int_0^T h_s \, dX_s\right] = E_{P_\theta}\left[\int_0^T h_s \, d\theta^* X_s\right]$ if $V \in \mathcal{A}$.

Note that (3.8) means that $\theta^* \Lambda$ defines the compensator of $X$ if $V \in \mathcal{X}$,
and (3.9) means that $\theta^* X$ defines the compensator of $X$ otherwise. This
offers an analogous interpretation as in the baseline setting. Compensators 
provide a notion of short-term behavior, analogously to the previous con- 
ditional distributions. The short-term behavior of a not directly intervened 
process in the counterfactual scenario, based on the past, coincides with the 
transformed short-term behavior from the factual scenario. The short-term 
behavior of a directly intervened process is given entirely by the transfor- 
mation. 

Following [22], we will say that a model consisting of a factual scenario, 
an action and a corresponding counterfactual distribution, defines a causal 
model if the counterfactual distribution would fit the actual corresponding 
counterfactual scenario. That $P_\theta$ actually would govern the frequency of
observations for this hypothetical scenario is generally not testable, and 
mostly comes down to the question of no unmeasured confounding [22]. 

4. Construction of counterfactual distributions. 

4.1. Construction at baseline. We will now construct the counterfactual 
distribution in a situation with no follow-up period. The construction is 
then closely related to Pearl's framework [21]. The next result is important 
and says heuristically that if the conditional probabilities, given the past,
of observing outcomes that coincide with counterfactually enforced ones are 
not too small, then there exists a counterfactual distribution. Equation (4.2) 
then offers a useful description of the distribution. Note that this is a measure 
theoretical version of the truncated factorization formula from [21], (3.10). 

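Theorem 1. If there exists a nonnegative $K \in L^1(\mathcal{F}_0, P)$ such that

(4.1) $\frac{d\theta Q|_{\mathcal{F}_0}}{dQ|_{\mathcal{F}_0}} \cdot \prod_{V \in \mathcal{A}} (Z_0^V)^{-1} \leq K$, $P$-a.s.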

then

(4.2) $P_\theta := \prod_{V \in \mathcal{X}} Z_0^V \cdot \theta Q|_{\mathcal{F}_0}$

defines a counterfactual distribution on $\mathcal{F}_0$ that is absolutely continuous with
respect to $P|_{\mathcal{F}_0}$ and imposes contemporaneously independent outcomes.
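
On a finite sample space the construction in the theorem can be carried out by hand. The following sketch is an illustration under assumptions that are not in the paper: three binary baseline modules `L`, `A`, `Y`, a uniform product reference measure, the action "always treat", and made-up probabilities. It builds the push-forward of the reference measure and multiplies it by the density factors of the non-intervened modules.

```python
from itertools import product

# Hypothetical baseline system (an assumption, not from the paper):
# binary confounder L, treatment A and outcome Y, with L -> A, L -> Y, A -> Y.
OMEGA = list(product((0, 1), repeat=3))  # sample points (l, a, y)

P_L = {0: 0.5, 1: 0.5}  # factual P(L = l)


def p_y_given(a, l, y):
    """Factual P(Y = y | A = a, L = l)."""
    p1 = 0.1 + 0.3 * a + 0.4 * l
    return p1 if y == 1 else 1.0 - p1


def q(w):
    """Reference measure Q: the three coordinates are independent and uniform."""
    return 1.0 / 8.0


def theta(w):
    """Action 'always treat': force A = 1, leave L and Y untouched."""
    l, _, y = w
    return (l, 1, y)


def push_forward(measure, transform):
    """(theta Q)({w}) = Q(theta^{-1}({w})) on the finite sample space."""
    out = {w: 0.0 for w in OMEGA}
    for w in OMEGA:
        out[transform(w)] += measure(w)
    return out


def z_l(w):
    """Density factor of the non-intervened module L with respect to Q."""
    return P_L[w[0]] / 0.5


def z_y(w):
    """Density factor of the non-intervened module Y, given its past (L, A)."""
    l, a, y = w
    return p_y_given(a, l, y) / 0.5


theta_q = push_forward(q, theta)
# Finite analogue of the construction: factors of the non-intervened
# modules times the push-forward of the reference measure.
p_theta = {w: z_l(w) * z_y(w) * theta_q[w] for w in OMEGA}

total_mass = sum(p_theta.values())
mean_y_counterfactual = sum(p * w[2] for w, p in p_theta.items())
```

The resulting weights sum to one, put no mass on untreated outcomes, and reproduce the truncated factorization $P(l) P(y \mid a = 1, l)$ of this toy network.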
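
Proof. First note that for every bounded $\mathcal{F}_0$-measurable random variable
$\eta$,

$E_P\left[\eta \frac{d\theta Q|_{\mathcal{F}_0}}{dQ|_{\mathcal{F}_0}} \prod_{V \in \mathcal{A}} (Z_0^V)^{-1}\right] = E_Q\left[\eta \frac{d\theta Q|_{\mathcal{F}_0}}{dQ|_{\mathcal{F}_0}} \prod_{V \in \mathcal{X}} Z_0^V\right] = E_{\theta Q}\left[\eta \prod_{V \in \mathcal{X}} Z_0^V\right] \leq E_P[\eta K]$.

This shows that (4.2) defines a finite measure $P_\theta$ on $\mathcal{F}_0$ such that $P_\theta \ll P|_{\mathcal{F}_0}$.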
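We choose an enumeration $V_1, \ldots, V_m$ of the variables in $\mathcal{X}$ such that
$j < k$ implies that $T(V_j) \leq T(V_k)$. If $V_k \in \mathcal{X}$ and $\eta$ is a bounded
$\mathcal{F}_0^{V_k \cup p(V_k)}$-measurable random variable, then

(4.3) $E_Q[\theta^* \eta \mid \mathcal{F}_0^{\{V_1, \ldots, V_{k-1}\}}] = \theta^* E_Q[\eta \mid \mathcal{F}_0^{p(V_k)}]$, $Q$-a.s.

To see this, we let $\eta_1$ be an $\mathcal{F}_0^{V_k}$-measurable and bounded random variable
and let $\eta_2$ be an $\mathcal{F}_0^{p(V_k)}$-measurable and bounded variable and compute

$E_Q[\theta^*(\eta_1 \eta_2) \mid \mathcal{F}_0^{\{V_1, \ldots, V_{k-1}\}}] = E_Q[\eta_1 \mid \mathcal{F}_0^{\{V_1, \ldots, V_{k-1}\}}] \, \theta^* \eta_2$
$= \theta^*(E_Q[\eta_1 \mid \mathcal{F}_0^{\{V_1, \ldots, V_{k-1}\}}] \, \eta_2)$
$= \theta^*(E_Q[\eta_1 \mid \mathcal{F}_0^{p(V_k)}] \, \eta_2)$
$= \theta^* E_Q[\eta_1 \eta_2 \mid \mathcal{F}_0^{p(V_k)}]$, $Q$-a.s.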

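Equation (4.3) now follows from the monotone class lemma. In particular, this
means that for every $k \leq m$,

(4.4) $E_Q[\theta^* Z_0^{V_k} \mid \mathcal{F}_0^{\{V_1, \ldots, V_{k-1}\}}] = \theta^* E_Q[Z_0^{V_k} \mid \mathcal{F}_0^{p(V_k)}] = 1$, $Q$-a.s.

and

$E_{\theta Q}[Z_0^{V_1} \cdots Z_0^{V_k}] = E_Q[\theta^* Z_0^{V_1} \cdots \theta^* Z_0^{V_k}]$
$= E_Q[\theta^* Z_0^{V_1} \cdots \theta^* Z_0^{V_{k-1}} E_Q[\theta^* Z_0^{V_k} \mid \mathcal{F}_0^{\{V_1, \ldots, V_{k-1}\}}]]$
$= E_{\theta Q}[Z_0^{V_1} \cdots Z_0^{V_{k-1}}]$.

That $P_\theta$ defines a probability measure on $\mathcal{F}_0$ follows by induction.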

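To see that (3.5) and (3.4) are satisfied, suppose $V_k \in \mathcal{X}$, and let $\eta, h$ be
bounded random variables such that $\eta$ is $\mathcal{F}_0^{V_k}$-measurable and $h$ is
$\mathcal{F}_0^{p(V_k)}$-measurable. We see that

$E_{P_\theta}[\eta h] = E_{\theta Q}\left[\left(\prod_{j=1}^{k-1} Z_0^{V_j}\right) \eta h Z_0^{V_k}\right]$
$= E_Q\left[\left(\prod_{j=1}^{k-1} \theta^* Z_0^{V_j}\right) \theta^* h \, E_Q[\theta^*(\eta Z_0^{V_k}) \mid \mathcal{F}_0^{\{V_1, \ldots, V_{k-1}\}}]\right]$
$= E_Q\left[\left(\prod_{j=1}^{k-1} \theta^* Z_0^{V_j}\right) \theta^* h \, \theta^* E_Q[\eta Z_0^{V_k} \mid \mathcal{F}_0^{p(V_k)}]\right]$
$= E_{P_\theta}[h \, \theta^* E_P[\eta \mid \mathcal{F}_0^{p(V_k)}]]$.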

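If $V_k \in \mathcal{A}$, then

$E_{P_\theta}[h \eta] = E_Q\left[\left(\prod_{j=1}^{k-1} \theta^* Z_0^{V_j}\right) \theta^* h \, \theta^* \eta\right] = E_{\theta Q}\left[\left(\prod_{j=1}^{k-1} Z_0^{V_j}\right) h \, \theta^* \eta\right] = E_{P_\theta}[h \, \theta^* \eta]$.

□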



4.2. Construction for the follow-up period. Condition (3.1) can be made
somewhat more concrete if the processes that may be directly intervened on
are only allowed to jump at a given finite sequence of predictable times. This
behavior is very different from that of Poisson processes. More formally, we
assume that there exists a bounded and $\mathcal{F}_t$-predictable multivariate counting
measure $U^A$ on $[0,T] \times J_A$ such that

(4.5) $N|_{[0,T] \times J_A} \ll U^A$

for every $A \in \mathcal{A}$. We can now show that the reference measure $Q$ is quasi-invariant
if the probability of an outcome that coincides with the counterfactually
enforced outcome in the short term is not too small.

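Proposition 1. Suppose that $\theta$ is an $\mathcal{F}$-measurable transformation on
$\Omega$ that satisfies (3.2)-(3.6) and assume (4.5). If there exists a bounded and
$\mathcal{P} \otimes \mathcal{J}$-measurable process $Y$ such that:

(1) $\theta Q|_{\mathcal{F}_0} \ll Q|_{\mathcal{F}_0}$;

(2)

(4.6) $\int_{J_A} \int_0^T h(s,x) \, \theta^* N(ds,dx) = \int_{J_A} \int_0^T h(s,x) Y(s,x) \, U^A(ds,dx)$

$Q$-a.s. for every $A \in \mathcal{A}$ and bounded and $\mathcal{P} \otimes \mathcal{J}$-measurable process $h$;

(3) there exists a constant $c > 0$ such that

(4.7) $1 - \theta^* N(\{s\}, J_A) \leq c \cdot (1 - U^A(\{s\}, J_A))$, $Q$-a.s.

for every $s \in [0, T]$,

then $\theta Q \ll Q$.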
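
Proof. The integral equation

$\int_J \int_0^T h(s,x) \, U^\theta(ds,dx) = \int_{J_{\mathcal{A}}} \int_0^T h(s,x) \, \theta^* N(ds,dx) + \sum_{V \in \mathcal{X}} \int_{J_V} \int_0^T h(s,x) \, U(ds,dx)$

defines an $\mathcal{F}_t$-predictable random measure $U^\theta$ on $[0, T] \times J$.

Let $B \subset J$ be a measurable subset, and define $N_t^B := \int_0^t \int_B N(ds,dx)$. If
$B \subset J_A$ for an $A \in \mathcal{A}$ and $S$ is an $\mathcal{F}_t$-adapted stopping time, then

$E_{\theta Q}[N_S^B - U^\theta(B, [0,S])] = E_{\theta Q}[N_S^B - \theta^* N_S^B] = E_Q[\theta^* N_S^B - \theta^* N_S^B] = 0$.

This means that $N_t^B - U^\theta(B, [0,t])$ defines a local $\theta Q$-martingale with respect
to the filtration $\mathcal{F}_t$. Similarly, if $B \subset J_{\mathcal{X}}$, note that

$E_{\theta Q}\left[\int_0^T h_s \, dN_s^B\right] = E_Q\left[\int_0^T \theta^* h_s \, dN_s^B\right] = E_Q\left[\int_0^T \theta^* h_s \, dU(ds, B)\right] = E_{\theta Q}\left[\int_0^T h_s \, dU^\theta(ds, B)\right]$

for every bounded and $\mathcal{F}_t$-predictable process $h$. Now, $N([0,t], B) - U^\theta([0,t], B)$
defines a local $\theta Q$-martingale with respect to the filtration $\{\mathcal{F}_t\}_{t \in [0,T]}$.
This means that

$E_{\theta Q}\left[\int_J \int_0^T h(s,x) \, N(ds,dx)\right] = E_{\theta Q}\left[\int_J \int_0^T h(s,x) \, U^\theta(ds,dx)\right]$

for every bounded and $\mathcal{P} \otimes \mathcal{J}$-measurable process $h$.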
We define the processes

$H^A(t,x) := Y(t,x) - 1 - \frac{U(\{t\}, J_A) - \theta^* N(\{t\}, J_A)}{1 - U(\{t\}, J_A)} I(U(\{t\}, J_A) \neq 1)$,

$\zeta_t^A := \int_{J_A} \int_0^t H^A(s,x) \, (N(ds,dx) - U(ds,dx))$,

and let $\zeta := \sum_{A \in \mathcal{A}} \zeta^A$.

By [14], Proposition I 3.13, there exists a $\mathcal{P} \otimes \mathcal{J}$-measurable and nonnegative
stochastic process $\gamma^A$ such that $\gamma^A \leq 1$ and

$\int_{J_A} \int_0^T h(s,x) \, U(ds,dx) = \int_{J_A} \int_0^T h(s,x) \gamma^A(s,x) \, U^A(ds,dx)$

$Q$-a.s. for every bounded and $\mathcal{P} \otimes \mathcal{J}$-measurable stochastic process $h$.

A computation shows that the predictable variation process for $\zeta$ with
respect to $Q$ satisfies

$\langle \zeta, \zeta \rangle_t = \sum_{A \in \mathcal{A}} \langle \zeta^A, \zeta^A \rangle_t = \sum_{A \in \mathcal{A}} \int_{J_A} \int_0^t H^A(s,x)^2 \gamma^A(s,x) (1 - \gamma^A(s,x)) \, U^A(ds,dx)$,

which is $Q$-a.s. uniformly bounded. Now, [17], Theorem II.1, implies that
the SDE

(4.8) $\rho_t = \frac{d\theta Q|_{\mathcal{F}_0}}{dQ|_{\mathcal{F}_0}} + \int_0^t \rho_{s-} \, d\zeta_s$

defines a uniformly integrable $Q$-martingale with respect to the filtration
$\mathcal{F}_t$. This means that

$\tilde{Q} := \rho_T \cdot Q$

defines a probability measure on $(\Omega, \mathcal{F})$.

A computation shows that if $B \subset J_V$ for some $V \in \mathcal{V}$, then

(4.9) $N_t^B - U([0,t], B) - \int_0^t \rho_{s-}^{-1} \, d\langle N^B - U^B, \rho \rangle_s = N_t^B - U^\theta([0,t], B)$.

Girsanov's Theorem [14], Theorem III 1.21, implies that

$E_{\tilde{Q}}\left[\int_J \int_0^T h(s,x) \, N(ds,dx)\right] = E_{\tilde{Q}}\left[\int_J \int_0^T h(s,x) \, U^\theta(ds,dx)\right]$

for every bounded and $\mathcal{P} \otimes \mathcal{J}$-measurable process $h$. Finally, [13],
Theorem 3.4, implies that there exists only one probability measure which has
$U^\theta$ as a dual predictable projection for $N$. Therefore $\theta Q = \tilde{Q} \ll Q$. □

The next result is important and says that if the probability of observing 
an outcome that coincides with the counterfactually enforced outcome at 
short-term is not too small, then there exists a counterfactual distribution for 
the follow-up period. The counterfactual distribution can then be obtained 
by re-weighting the factual distribution, that is, $P_\theta \ll P$. Note that (4.12)
provides a continuous-time analogy of the truncated factorization formula 
for Bayesian networks [21], (3.10). 

Theorem 2. Suppose that the conditions of Theorem 1 are satisfied and
that there exists a bounded and $\mathcal{P} \otimes \mathcal{J}$-measurable process $Y$ such that:

(1)

(4.10) $\int_{J_A} \int_0^T h(s,x) \, \theta^* N(ds,dx) = \int_{J_A} \int_0^T h(s,x) Y(s,x) \lambda(s,x) \, U(ds,dx)$

$P$-a.s. for every $A \in \mathcal{A}$ and bounded and $\mathcal{P} \otimes \mathcal{J}$-measurable process $h$;

(2) there exists a constant $c > 0$ such that

(4.11) $1 - \theta^* N(\{s\}, J_A) \leq c \, (1 - \lambda \cdot U(\{s\}, J_A))$, $P$-a.s.

for every $s \in [0, T]$.

Then there exists a counterfactual distribution $P_\theta$ such that $P_\theta \ll P$. We
also have that $P_\theta \ll \theta Q$ and

(4.12) $X_t := \prod_{V \in \mathcal{X}} Z_t^V$,

where $Z^V$ is the process defined in (2.10), defines a $\theta Q$-martingale with
respect to the filtration $\{\mathcal{F}_t\}$ that satisfies the SDE

(4.13) $X_t = \prod_{V \in \mathcal{X}} Z_0^V + \sum_{V \in \mathcal{X}} \int_0^t X_{s-} \, dK_s^V$

and

(4.14) $X_t = \frac{dP_\theta|_{\mathcal{F}_t}}{d\theta Q|_{\mathcal{F}_t}}$, $\theta Q$-a.s.

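Proof. We follow the proof of Proposition 1 and define the processes

$G^A(t,x) := Y(t,x) - 1 - \frac{\lambda \cdot U(\{t\}, J_A) - \theta^* N(\{t\}, J_A)}{1 - \lambda \cdot U(\{t\}, J_A)} I(\lambda \cdot U(\{t\}, J_A) \neq 1)$,

$\xi_t^A := \int_{J_A} \int_0^t G^A(s,x) \, (N(ds,dx) - \lambda \cdot U(ds,dx))$,

$\xi := \sum_{A \in \mathcal{A}} \xi^A$.

By [14], Proposition I 3.13, there exists a $\mathcal{P} \otimes \mathcal{J}$-measurable and nonnegative
stochastic process $\gamma^A$ such that $\gamma^A \leq 1$ and

$\int_{J_A} \int_0^T h(s,x) \lambda(s,x) \, U(ds,dx) = \int_{J_A} \int_0^T h(s,x) \gamma^A(s,x) \, U^A(ds,dx)$

$Q$-a.s. for every bounded and $\mathcal{P} \otimes \mathcal{J}$-measurable stochastic process $h$.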

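A computation shows that the predictable variation process for $\xi$ with
respect to $P$ satisfies

$\langle \xi, \xi \rangle_t = \sum_{A \in \mathcal{A}} \langle \xi^A, \xi^A \rangle_t = \sum_{A \in \mathcal{A}} \int_{J_A} \int_0^t G^A(s,x)^2 \gamma^A(s,x) (1 - \gamma^A(s,x)) \, U^A(ds,dx)$,

which is $Q$-a.s. uniformly bounded. Now, [17], Theorem II.1, implies that the
SDE

(4.15) $W_t = \frac{dP_\theta|_{\mathcal{F}_0}}{dP|_{\mathcal{F}_0}} + \int_0^t W_{s-} \, d\xi_s$

defines a uniformly integrable $P$-martingale with respect to the filtration
$\mathcal{F}_t$. This means that

$P_\theta := W_T \cdot P$

defines a probability measure on $(\Omega, \mathcal{F})$.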
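The integral equation

(4.16) $\int_J \int_0^T h(s,x) \, \nu^\theta(ds,dx) = \int_{J_{\mathcal{X}}} \int_0^T h(s,x) \lambda(s,x) \, U(ds,dx) + \int_{J_{\mathcal{A}}} \int_0^T h(s,x) \, \theta^* N(ds,dx)$

defines a predictable and nonnegative random measure $\nu^\theta$ on $[0,T] \times J$ such
that

(4.17)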

x /(A • U({s}, J) + 1) ) N(ds, dx) - A • U(ds, dx). 



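We obtain from [13], Theorem 5.2, that

$E_{P_\theta}\left[\int_J \int_0^T h(s,x) \, N(ds,dx)\right] = E_{P_\theta}\left[\int_J \int_0^T h(s,x) \, \nu^\theta(ds,dx)\right]$

for every bounded and $\mathcal{P} \otimes \mathcal{J}$-measurable process $h$; that is, $P_\theta$ defines a
counterfactual distribution.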
We may compute that 



AC 



J A 



Y(s,x)N({s},dx)-d*N({s},J A ) 



+ (U({s}, J a) ~ 0*N({s}, J A )I(U({s}, J a) * I)) 
x(U({ s },J A )-N({s},J A )) 



and that 



Ja 



Y(s,x)N({s},dx)-e*N({s},J A ) 



+ (U({s}, J A ) - 9*N({s}, J A )I(X ■ U({s}, J A ) + 1)) 
x (U({s},J A )-N({s},J A )). 

We moreover define a process $\chi$ as follows:

$\chi_t := \sum_{s \leq t} \frac{\Delta\xi_s - \Delta\zeta_s}{\Delta\zeta_s + 1}$.



One can show that $\chi$ only jumps at the jump times of $U$ and that $\Delta\chi$ is
uniformly bounded. This means that the SDE

(4.18) $\pi_t = \prod_{V \in \mathcal{A}} (Z_0^V)^{-1} + \int_0^t \pi_{s-} \, d\chi_s$

defines a $P$ semi-martingale with respect to the filtration $\mathcal{F}_t$. Note that
$\Delta\zeta_s = -1$ implies that $\Delta\xi_s = -1$, so

(4.19) $\zeta + [\zeta, \chi] + \chi = \xi$, $P$-a.s.

Yor's additive formula [23], Theorem II 38, then implies that

(4.20) $\pi_t \rho_t = \frac{dP_\theta|_{\mathcal{F}_0}}{dP|_{\mathcal{F}_0}} + \int_0^t \pi_{s-} \rho_{s-} \, d\xi_s$.

This implies that $W = \rho \pi$, and hence

$E_{P_\theta}[h] = E_P[h W_T] = E_Q[h Z_T \rho_T \pi_T] = E_{\theta Q}[h Z_T \pi_T]$

for every bounded and $\mathcal{F}_T$-measurable random variable $h$, so $P_\theta \ll \theta Q$.
Finally [13], Theorem 5.1, shows that the likelihood ratio is given by the
SDE (4.13), and hence Yor's additive formula provides identity (4.12). □

Note that since $P_\theta \ll \theta Q = \theta^2 Q$, the counterfactual distribution $P_\theta$ is
actually invariant with respect to the action $\theta$, that is,

(4.21) $\theta P_\theta = P_\theta$.

5. Local independence. 

5.1. Identifiability and short-term dependence. A causal effect is iden- 
tifiable if it can be uniquely obtained from the factual distribution of the 
observable variables. This is generally very hard to determine and may also 
require further parametric assumptions. We show that it is possible to take 
advantage of graphical structure, in terms of local independence graphs, to 
do this. Such graphs are useful when deciding in which situations causal 
effects are identifiable, and also which factors we might adjust for. 

We will say that $V \in \mathcal{V}$ is locally independent of a subset $B \subset \mathcal{V}$ at baseline,
conditionally on $\mathcal{V}' \subset \mathcal{V}$, if the conditional density of $V_0$, given the past, does
not depend on the baseline information from $B$. More precisely, for every
integrable and $\mathcal{F}_0^V$-measurable random variable $\eta$, there exists a random
variable $\tilde{\eta}$ that is $\mathcal{F}_0^{p(V) \cap \mathcal{V}' \setminus B}$-measurable and such that if $h$ is
$\mathcal{F}_0^{p(V) \cap \mathcal{V}'}$-measurable, then

(5.1) $E_P[\eta h] = E_P[\tilde{\eta} h]$.

A process $V \in \mathcal{V}$ is locally independent of $B \subset \mathcal{V}$ during follow-up,
conditionally on $\mathcal{V}'$, if for every process $X$ of the form (2.2), there exists an
$\mathcal{F}_t^{\{V\} \cup \mathcal{V}' \setminus B}$-predictable process $\Lambda$ with finite variation such that

(5.2) $E_P\left[\int_0^T h_s \, dX_s\right] = E_P\left[\int_0^T h_s \, d\Lambda_s\right]$

for every bounded and $\mathcal{F}_t^{\{V\} \cup \mathcal{V}'}$-predictable process $h$. If $V$ is locally
independent of $B$, conditionally on $\mathcal{V}'$, both at baseline and during follow-up, we
will say that $V$ is locally independent of $B$, conditionally on $\mathcal{V}'$. This will
sometimes be written $B \nrightarrow V \mid \mathcal{V}'$. A local independence graph is a directed
graph $G = (\mathcal{V}', \mathcal{E})$ for $\mathcal{V}' \subset \mathcal{V}$ such that the absence of an arrow from a subset
$B \subset \mathcal{V}$ to a process $V \in \mathcal{V}'$ means that $B \nrightarrow V \mid \mathcal{V}'$. Local independence
graphs were introduced in [26]; see also [1, 9].

Given time points $\{T(V_0)\}_{V \in \mathcal{V}}$ at baseline and a local independence graph
$G = (\mathcal{V}, \mathcal{E})$, we can pick a linear ordering of $\mathcal{V}_0$ that satisfies (2.4) and
therefore yields

(5.3) $V_0^i \perp\!\!\!\perp_P \{V_0^1, \ldots, V_0^{i-1}\} \setminus \mathrm{pa}(V_0^i) \mid \mathrm{pa}(V_0^i)$

for every $i \leq n$. Property (5.3) is known as the ordered directed Markov
property and was shown to be equivalent to the local directed Markov
property in [16], Theorem 2.11. This means that Bayesian networks and local
independence graphs are two descriptions of the same structure when the
nodes correspond to single variables. Note that local independence graphs,
where the nodes are allowed to be families of variables or processes, are
allowed to be cyclic.

5.2. Measurability of intensities. Local independence during the
follow-up is closely related to the measurability of intensities.

Lemma 1. Suppose that $V$ is locally independent of $B$ at baseline,
conditionally on $\mathcal{V}'$. Then $B \nrightarrow V \mid \mathcal{V}'$ if and only if there exists a nonnegative and
$\mathcal{P}^{\{V\} \cup \mathcal{V}' \setminus B}$-measurable process $\lambda^V$ such that

(5.4) $E_P\left[\int_{J_V} \int_0^T h(s,x) \, N(ds,dx)\right] = E_P\left[\int_{J_V} \int_0^T h(s,x) \lambda^V(s,x) \, U(ds,dx)\right]$

for every bounded and $\mathcal{P}^{\{V\} \cup \mathcal{V}'}$-measurable process $h$.

Proof. If there exists a process $\lambda^V$ as in (5.4), then $B \nrightarrow V \mid \mathcal{V}'$ follows
directly. Conversely, suppose that $B \nrightarrow V \mid \mathcal{V}'$ and let $D \subset J_V$ be a measurable
subset. Now, $N_t^D := N([0,t], D)$ defines a process of the form (2.2), so
there must exist a corresponding predictable increasing process $\Lambda^D$ of finite
variation such that

$E_P\left[\int_0^T h_s \, dN_s^D\right] = E_P\left[\int_0^T h_s \, d\Lambda_s^D\right]$

for every bounded and $\mathcal{F}_t^{\{V\} \cup \mathcal{V}'}$-predictable process $h$.

The Radon-Nikodym theorem now provides a $\mathcal{P}^{\{V\} \cup \mathcal{V}' \setminus B}$-measurable
and nonnegative process $\lambda^{(D)}$ such that

(5.5) $E_P\left[\int_0^T h_s \, dN_s^D\right] = E_P\left[\int_0^T h_s \lambda_s^{(D)} \, U(ds, D)\right]$

for every bounded and $\mathcal{P}^{\{V\} \cup \mathcal{V}'}$-measurable process $h$.

Since $J$ is a Lusin space, we may construct a nonnegative and
$\mathcal{P}^{\{V\} \cup \mathcal{V}' \setminus B}$-measurable process $\lambda^V$ that satisfies (5.4) as a limit of processes that are
finite linear combinations of processes of the form $f \cdot I_D$, where $D$ is a
measurable subset in $J_V$, and $f$ is a bounded $\mathcal{P}^{\{V\} \cup \mathcal{V}' \setminus B}$-measurable process.

□

5.3. Markovian factorization property. The local Markov property implies 
the Markovian factorization property; see [21], (1.33) and [16], (2.10). 
We will now see that a local independence graph yields a similar factorization 
for the follow-up period. We use the following notation from graph 
theory: whenever $V \in \mathcal{V}$, let $\mathrm{cl}(V) \subset \mathcal{V}$ denote the set formed by V and its 
parents in G. 

Theorem 3. If $G = (\mathcal{V}, \mathcal{E})$ is a local independence graph with respect to 
P, then there exists an $\mathcal{F}^{\mathrm{cl}(V)}_t$-adapted P-indistinguishable version of each 
process $Z^V$ from Theorem 2.9, where 

$Z = \prod_{V \in \mathcal{V}} Z^V$, P-a.s. 



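Proof. Let $\mathcal{F}^{\mathrm{pa}(V)}_0 := \bigvee_{V' \in \mathrm{pa}(V)} \mathcal{F}^{V'}_0$ and $\mathcal{F}^{\mathrm{cl}(V)}_0 := \bigvee_{V' \in \mathrm{cl}(V)} \mathcal{F}^{V'}_0$, and let 

$Y^V := \frac{dP|_{\mathcal{F}^{\mathrm{pa}(V)}_0}}{dQ|_{\mathcal{F}^{\mathrm{pa}(V)}_0}}$. 

Now 

(5.6) $P|_{\mathcal{F}^{\mathrm{cl}(V)}_0} \ll Y^V \cdot Q|_{\mathcal{F}^{\mathrm{cl}(V)}_0}$, 

so there exists, by the Radon-Nikodym theorem, an $\mathcal{F}^{\mathrm{cl}(V)}_0$-measurable random 
variable $Z^V_0$ such that 

(5.7) $P|_{\mathcal{F}^{\mathrm{cl}(V)}_0} = Z^V_0 Y^V \cdot Q|_{\mathcal{F}^{\mathrm{cl}(V)}_0}$. 

We then have, for every bounded and measurable function h, that 

$E_P[h(V_i) \mid \mathcal{F}^{\{V_1, \dots, V_{i-1}\}}_0] = E_P[h(V_i) \mid \mathcal{F}^{\mathrm{pa}(V_i)}_0] = E_Q[h(V_i) Z^{V_i}_0 \mid \mathcal{F}^{\mathrm{pa}(V_i)}_0]$. 

The contemporaneous independence at baseline and a simple monotone class 
argument show that 

(5.8) $E_P[\eta] = E_Q\left[\eta \prod_{V \in \mathcal{V}} Z^V_0\right]$ 

for every bounded and $\mathcal{F}_0$-measurable random variable η. 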

For the follow-up, note that by Lemma 1 there exists a nonnegative and 
$\mathcal{P}^{\mathrm{cl}(V)}$-measurable process $\lambda^V$ such that 

$E_P\left[\int_{J_V}\int_0^T h(s,x)\,N(ds,dx)\right] = E_P\left[\int_{J_V}\int_0^T h(s,x)\lambda^V(s,x)\,U(ds,dx)\right]$ 

for every bounded and $\mathcal{P}$-measurable process h. 

We may now form $K^V$, $Z^V$ and Z as in Theorem 2.9 using $\lambda^V$ instead 
of λ. Following the short argument in [6], Theorem II T12, we see that any 
other choice of a nonnegative and $\mathcal{P}$-measurable process λ that satisfies the 
previous equation would necessarily give 

(5.9) $\int_{J_V}\int_0^T I(\lambda(s,x) \neq \lambda^V(s,x))\,N(ds,dx) = 0$, P-a.s. 

This means that the corresponding versions of the process $K^V$ from (2.6) 
would be P-indistinguishable. Furthermore, this also means that the version 
corresponding to $\lambda^V$ provides an $\mathcal{F}^{\mathrm{cl}(V)}_t$-adapted solution of the SDE (2.10) 
which is a P-indistinguishable version of $Z^V$. □ 
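
At baseline, the factorization can be checked by hand on a small discrete example. The sketch below is an illustration under assumptions that are not taken from the paper: three binary baseline variables W, A, L with arrows W → L ← A, hypothetical conditional probabilities, and a reference measure Q under which the variables are independent fair coins. Each factor is taken to be the ratio of the P-conditional probability given the parents to the Q-probability, and exact rational arithmetic confirms that the product of the factors equals the likelihood ratio dP/dQ.

```python
from fractions import Fraction as Fr
from itertools import product

def bern(p_one, x):
    """Probability that a binary variable with P(X = 1) = p_one equals x."""
    return p_one if x == 1 else 1 - p_one

# Hypothetical factual model P with graph W -> L <- A (W and A have no parents).
def p_w(w): return bern(Fr(1, 4), w)
def p_a(a): return bern(Fr(1, 2), a)
def p_l(l, a, w): return bern(Fr(1, 5) + Fr(2, 5) * a + Fr(1, 5) * w, l)

Q_POINT = Fr(1, 2)  # under Q every coordinate is an independent fair coin

def z_w(w): return p_w(w) / Q_POINT
def z_a(a): return p_a(a) / Q_POINT
def z_l(l, a, w): return p_l(l, a, w) / Q_POINT  # depends only on cl(L) = {W, A, L}

def z_product(w, a, l):
    """Product of the node-wise factors."""
    return z_w(w) * z_a(a) * z_l(l, a, w)

def likelihood_ratio(w, a, l):
    """dP/dQ evaluated at a single outcome."""
    return (p_w(w) * p_a(a) * p_l(l, a, w)) / Q_POINT ** 3

outcomes = list(product((0, 1), repeat=3))
factorizes = all(z_product(*o) == likelihood_ratio(*o) for o in outcomes)
# Re-weighting Q by the product recovers P-expectations, for example P(L = 1).
p_l1 = sum(Q_POINT ** 3 * z_product(w, a, l) * l for (w, a, l) in outcomes)
```

Each factor only involves a node and its parents, which is the baseline analogue of the adaptedness statement in Theorem 3.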



6. An example: Controlled direct effects. We now illustrate how local 
independence graphs can be used to identify causal effects by an example 
with cancer patients. Suppose each patient is offered one of two different 
surgical treatments, $a_1$ or $a_2$. The patient is subject to an examination after 
surgery where some measurements are taken. These measurements might de- 
pend on the chosen surgical procedure and some underlying health condition 
that is not directly observed. After the surgery, the patient is given further 
treatment in order to prevent relapse. The chosen post surgery treatment 
strategy might depend on the surgical procedure and the measurements. 

We consider a generic model for the patients in this scenario. The relevant 
outcomes are provided by the family of random variables $\mathcal{V} = \{W, A, L, K, B\}$. 
As in Section 2, we consider a probability measure Q such that these variables 
are independent and a probability measure P that governs the frequency 
of outcomes in the factual scenario and such that $P \ll Q$. Let the 
random variable A denote the choice of surgery, let W denote the latent 
health condition, let L take the value of the measurements after surgery, let 
K denote the post surgery treatment strategy and let B denote the status of 
relapse. We furthermore assume that $T(W) < T(A) < T(L) < T(K) < T(B)$ 
and that the following local independencies are satisfied: 




How much of the treatment effect is due to the choice of surgical procedure 
alone, that is, not due to the choice of post surgery treatment? Pearl [21], 
Section 4.5.3, showed that it is possible to identify the controlled direct effect 
from surgery on the risk of relapse, even without any observations of W. We 
rephrase his argument slightly: 

Proposition 2. If $\theta^* K$ is $\mathcal{F}_0$-measurable, $\theta^* A$ is constant, L, W and 
B are $\theta$-invariant, there exists a constant c > 0 such that 

(6.1) $P(A = \theta^* A) > 0$ and $P(K = \theta^* K \mid A = \theta^* A, L) > c$, P-a.s., 

and h is a bounded and measurable function, then there exists a unique 
counterfactual distribution $P_\theta$ such that $P_\theta \ll P$ and 

(6.2) $E_{P_\theta}[h(B)] = \theta^* E_P\left[\theta^* E_P[h(B) \mid \mathcal{F}^{\{L,A,K\}}_0] \mid \mathcal{F}^{\{A\}}_0\right]$, $P_\theta$-a.s. 

Moreover, suppose that $Z^B$ is a nonnegative and $\mathcal{F}^{\{L,A,K,B\}}_0$-measurable 
random variable and $Z^L$ is a nonnegative $\mathcal{F}_0$-measurable random 
variable such that 

$E_P[h(B) \mid \mathcal{F}^{\{A,L,K\}}_0] = E_Q[h(B) Z^B \mid \mathcal{F}^{\{A,L,K\}}_0]$, 
$E_P[h(L) \mid \mathcal{F}^{\{A\}}_0] = E_Q[h(L) Z^L \mid \mathcal{F}^{\{A\}}_0]$ 

P-a.s. Now, 

(6.3) $E_{P_\theta}[H] = E_{\theta Q}[H Z^L Z^B]$ 

for every $\mathcal{F}_0$-measurable random variable H, that is, 

(6.4) $\frac{dP_\theta|_{\mathcal{F}_0}}{d\theta Q|_{\mathcal{F}_0}} = Z^L Z^B$. 

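Proof. Note that (6.1) means that (4.1) is satisfied, that is, we obtain 
a counterfactual distribution $P_\theta$ from Theorem 1. 
Whenever $h_1, h_2$ are bounded and measurable functions, then 

$E_{P_\theta}[h_1(B) h_2(L)] = E_P[W h_1(B) h_2(L)]$ 
$= E_P[W E_P[h_1(B) \mid \mathcal{F}^{\{A,K,L\}}_0] h_2(L)]$ 
$= E_{P_\theta}[E_P[h_1(B) \mid \mathcal{F}^{\{A,K,L\}}_0] h_2(L)]$ 
$= E_{P_\theta}[\theta^* E_P[h_1(B) \mid \mathcal{F}^{\{A,K,L\}}_0] h_2(L)]$ by (4.21). 

This shows that $E_{P_\theta}[h_1(B) \mid \mathcal{F}^{\{A,L,K\}}_0] = \theta^* E_P[h_1(B) \mid \mathcal{F}^{\{A,L,K\}}_0]$ $P_\theta$-a.s. Moreover, 
note that 

$E_{P_\theta}[h_2(L)] = E_{P_\theta}[\theta^* E_P[h_2(L) \mid \mathcal{F}^{\{A,W\}}_0]]$ 
$= E_{P_\theta}[\theta^* E_P[h_2(L) \mid \mathcal{F}^{\{A\}}_0]]$ 
$= \theta^* E_P[h_2(L) \mid \mathcal{F}^{\{A\}}_0]$, $P_\theta$-a.s. 

Combining these computations, we obtain 

$E_{P_\theta}[h(B)] = E_{P_\theta}[E_{P_\theta}[h(B) \mid \mathcal{F}^{\{A,L,K\}}_0]]$ 
$= E_{P_\theta}[\theta^* E_P[h(B) \mid \mathcal{F}^{\{A,L,K\}}_0]]$ 
$= \theta^* E_P[\theta^* E_P[h(B) \mid \mathcal{F}^{\{L,A,K\}}_0] \mid \mathcal{F}^{\{A\}}_0]$ 

$P_\theta$-a.s. for every bounded and measurable function h. 

To see that equation (6.3) is satisfied, note that by the monotone class 
lemma, 

$E_{P_\theta}[H] = \theta^* E_P[\theta^* E_P[H \mid \mathcal{F}^{\{A,L,K\}}_0] \mid \mathcal{F}^{\{A\}}_0]$ 
$= \theta^* E_Q[\theta^* E_Q[H Z^B \mid \mathcal{F}^{\{A,L,K\}}_0] Z^L \mid \mathcal{F}^{\{A\}}_0]$ 
$= E_{\theta Q}[E_Q[H Z^B \mid \mathcal{F}^{\{A,L,K\}}_0] Z^L]$ 
$= E_{\theta Q}[H Z^B Z^L]$. □ 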
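
The re-weighting used in this proof can be illustrated numerically. The following sketch assumes a hypothetical discrete model (binary W, A, L, K, B, where the latent W affects only L and B, and K depends only on A and L) and builds the weight 1{A = a, K = k} / (P(A = a) P(K = k | A = a, L)) from the joint law of the observed variables alone. In this toy model the weight plays the role of the likelihood ratio between the counterfactual and the factual distribution for the action that forces A to a and K to k. All numerical values and function names are assumptions made for illustration.

```python
from fractions import Fraction as Fr
from itertools import product

def bern(p_one, x):
    return p_one if x == 1 else 1 - p_one

# Hypothetical structural model; W is latent and never used by the weights.
def joint(w, a, l, k, b):
    return (bern(Fr(1, 3), w) * bern(Fr(1, 2), a)
            * bern(Fr(1, 5) + Fr(2, 5) * a + Fr(1, 5) * w, l)
            * bern(Fr(1, 4) + Fr(1, 4) * a + Fr(1, 4) * l, k)
            * bern(Fr(1, 10) + Fr(1, 5) * a + Fr(1, 10) * l
                   + Fr(1, 5) * k + Fr(3, 10) * w, b))

def observed(a, l, k, b):
    """Joint law of the observed variables, with W summed out."""
    return sum(joint(w, a, l, k, b) for w in (0, 1))

def weight(a_obs, l, k_obs, a, k):
    """Weight for forcing A = a and K = k, built from observed quantities."""
    if (a_obs, k_obs) != (a, k):
        return Fr(0)
    p_a = sum(observed(a, l2, k2, b2) for l2, k2, b2 in product((0, 1), repeat=3))
    p_al = sum(observed(a, l, k2, b2) for k2, b2 in product((0, 1), repeat=2))
    p_alk = sum(observed(a, l, k, b2) for b2 in (0, 1))
    return 1 / (p_a * (p_alk / p_al))

def weighted_risk(a, k):
    """E_P[weight * 1{B = 1}], the re-weighted relapse probability."""
    return sum(observed(a2, l, k2, 1) * weight(a2, l, k2, a, k)
               for a2, l, k2 in product((0, 1), repeat=3))

def true_risk(a, k):
    """P(B = 1) when A and K are forced, computed with the latent W."""
    return sum(bern(Fr(1, 3), w)
               * bern(Fr(1, 5) + Fr(2, 5) * a + Fr(1, 5) * w, l)
               * bern(Fr(1, 10) + Fr(1, 5) * a + Fr(1, 10) * l
                      + Fr(1, 5) * k + Fr(3, 10) * w, 1)
               for w, l in product((0, 1), repeat=2))
```

In this example P(K = k | A = a, L) is bounded below by 1/4, so the analogue of condition (6.1) holds and the weights are bounded.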

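If we consider actions $\theta_1$ and $\theta_2$ such that $\theta_i^* A = a_i$ and $\theta_1^* K = \theta_2^* K$, Q-a.s., 
then the relative direct risk of relapse is given by 

$\frac{P_{\theta_1}(B = 1)}{P_{\theta_2}(B = 1)} = \frac{E_P[\theta_1^* E_P[h(B) \mid \mathcal{F}^{\{A,L,K\}}_0] \mid A = a_1]}{E_P[\theta_2^* E_P[h(B) \mid \mathcal{F}^{\{A,L,K\}}_0] \mid A = a_2]}$. 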
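
As a numerical illustration of this ratio, the sketch below evaluates the nested conditional expectations of (6.2) for h equal to the indicator of relapse, using only the joint law of the observed variables (A, L, K, B). The model is hypothetical: the probabilities, the binary coding, and the helper names `marginal`, `controlled_risk` and `relative_direct_risk` are assumptions introduced for this example.

```python
from fractions import Fraction as Fr
from itertools import product

def bern(p_one, x):
    return p_one if x == 1 else 1 - p_one

def observed(a, l, k, b):
    """Hypothetical joint law of (A, L, K, B); the latent W is summed out."""
    return sum(bern(Fr(1, 3), w) * bern(Fr(1, 2), a)
               * bern(Fr(1, 5) + Fr(2, 5) * a + Fr(1, 5) * w, l)
               * bern(Fr(1, 4) + Fr(1, 4) * a + Fr(1, 4) * l, k)
               * bern(Fr(1, 10) + Fr(1, 5) * a + Fr(1, 10) * l
                      + Fr(1, 5) * k + Fr(3, 10) * w, b)
               for w in (0, 1))

def marginal(**fixed):
    """Probability that the observed variables take the given values."""
    return sum(observed(a, l, k, b)
               for a, l, k, b in product((0, 1), repeat=4)
               if all(dict(a=a, l=l, k=k, b=b)[name] == value
                      for name, value in fixed.items()))

def controlled_risk(a, k):
    """sum_l P(B = 1 | A = a, L = l, K = k) * P(L = l | A = a)."""
    return sum(marginal(a=a, l=l, k=k, b=1) / marginal(a=a, l=l, k=k)
               * marginal(a=a, l=l) / marginal(a=a)
               for l in (0, 1))

def relative_direct_risk(a1, a2, k):
    """Ratio of controlled risks for two surgeries under the same forced k."""
    return controlled_risk(a1, k) / controlled_risk(a2, k)

print(relative_direct_risk(1, 0, 0))  # 35/17
```

The latent variable never enters `controlled_risk`; it is only used to generate the observed law, in line with the claim that the effect is identified without observations of W.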

6.1. Incomplete observations and time dependent treatments. We have 
not yet taken into account that the patient observations could be censored 
during the follow-up period. There might be several reasons for such 
censoring: the end of the study period, drop-out due to the 
underlying health condition, or other reasons. The risk of having an observed 
relapse will typically be smaller than the risk of having a relapse. We 
will work in the framework of event history analysis in order to provide a 
reasonable effect measure subject to such incomplete observations. This will 
also allow us to consider time dependent post surgery strategies K. 
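
The gap between the two risks is easy to quantify in a simple special case. The sketch below assumes constant and independent hazards for relapse and censoring, an assumption that is made only for this illustration; `alpha` and `gamma` are hypothetical names for the relapse and censoring hazards.

```python
import math

def relapse_risk(t, alpha):
    """P(relapse before t) for a constant relapse hazard alpha."""
    return 1.0 - math.exp(-alpha * t)

def observed_relapse_risk(t, alpha, gamma):
    """P(relapse before t and before censoring).

    Censoring is assumed independent with constant hazard gamma, so the
    probability is the integral of alpha * exp(-(alpha + gamma) * s) over [0, t].
    """
    total = alpha + gamma
    return alpha / total * (1.0 - math.exp(-total * t))
```

For any positive censoring hazard the observed risk is strictly smaller than the relapse risk, which is why a naive proportion of observed relapses would underestimate the quantity of interest.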

6.1.1. A dynamic model. We proceed with the previous setup, but where 
B and K are represented by processes and every patient may be censored 
during the follow-up period. The factors A, L and W are as in the previous 
example. B is represented by a counting process that jumps from 0 to 1 at 
the time of the event. The censoring of the individual is represented by a 
counting process C that jumps from 0 to 1 at the time of censoring. 

We suppose that the baseline treatment A may be of two different types; 
hence A takes values in {0, 1}. Moreover, we suppose that additional post-surgery 
treatment is given to the patient at the jumps of the counting process 
K. This treatment may be given recursively, but only at a series of $\mathcal{F}_t$-predictable 
times; that is, (4.5) must be satisfied. We furthermore suppose 
that $\theta^* K_s$ is constant for every s P-a.s. and suppose that $B_0 = 0$, $K_0 = 0$ 
and $C_0 = 0$ P-a.s. 

Let $T_1, \dots, T_n$ denote the potential post-treatment times, and let 
$U^K_t := \sum_i I(T_i \le t)$. The counting process $U^K$ is predictable and 
$\nu^K_t = \int_0^t P(\Delta K_s \neq 0 \mid \mathcal{F}_{s-})\,dU^K_s$. By Theorem 2, we see that there exists a counterfactual 
distribution if $P(A = \theta^* A) > 0$ and there exist $c_1, c_2 > 0$ such that 

(6.6) $1 - c_1 P(\Delta K_s = 0 \mid \mathcal{F}_{s-}) \le \Delta\theta^* K_s \le c_2 P(\Delta K_s \neq 0 \mid \mathcal{F}_{s-})$ 

for every s P-a.s. 

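We suppose that the following local independence graph is satisfied with 
respect to the factual distribution P: 

[Figure: local independence graph for the dynamic model (the nodes include A, L, W and C); the arrows are not recoverable from the source.] 

In particular, this means that the short-term behavior of the censoring may 
not depend on variables other than A. 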

6.1.2. Restriction to Aalen's additive hazard model. If we assume that 
the event process satisfies Aalen's additive hazard model [2], it is actually 
possible to identify, and also consistently estimate, the direct effect from 
surgery. Every outcome after the time of censoring is supposed to be unobserved. 
In addition, we assume that we are not able to observe the variable W. 

We consider the censored process 

$\tilde{B}_t := B_0 + \int_0^t (1 - C_{s-})\,dB_s$ 

and let $\mathcal{F}_t$ denote the filtration that is generated by A, K, L, C and $\tilde{B}$. Furthermore, 
let $Y_t$ denote the factual "at-risk" process, that is, $Y_t = I(B_{t-} = C_{t-} = 0)$. 
We assume that there exist measurable and bounded functions 
$\psi^0$, $\psi^A$, $\psi^L$ and $\psi^K$ such that 

(6.7) $E_P\left[\int_0^T h_s\,d\tilde{B}_s\right] = E_P\left[\int_0^T h_s Y_s(\psi^0_s + A\psi^A_s + L\psi^L_s + K_{s-}\psi^K_s)\,ds\right]$ 

for every bounded and $\mathcal{F}_t$-predictable process h. 
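
To indicate how cumulative regression functions in an additive hazard model can be estimated, the following sketch treats a deliberately reduced special case: an intercept and the binary covariate A only, with L and K omitted. In that saturated case the least-squares increments of Aalen's estimator reduce to group-wise Nelson-Aalen increments and their difference. The data, the function names and the no-ties convention are assumptions made for this illustration, and the sketch is not the estimator analysed in the paper.

```python
from fractions import Fraction as Fr

def nelson_aalen(records, t):
    """Nelson-Aalen estimate at time t from (time, event) pairs without ties.

    event is 1 for an observed relapse and 0 for censoring.
    """
    at_risk = len(records)
    cumulative = Fr(0)
    for time, event in sorted(records):
        if time > t:
            break
        if event == 1:
            cumulative += Fr(1, at_risk)  # increment 1 / (number at risk)
        at_risk -= 1                      # the individual leaves the risk set
    return cumulative

def additive_fit(group0, group1, t):
    """Cumulative baseline and cumulative effect of A at time t.

    With an intercept and one binary covariate, the least-squares increments
    are the Nelson-Aalen increment for A = 0 and the difference between the
    increments for A = 1 and A = 0.
    """
    baseline = nelson_aalen(group0, t)
    return baseline, nelson_aalen(group1, t) - baseline

# Hypothetical follow-up data: (time, event) for A = 0 and for A = 1.
GROUP0 = [(2, 1), (4, 0), (6, 1)]
GROUP1 = [(1, 1), (3, 1), (5, 0)]
print(additive_fit(GROUP0, GROUP1, 5))  # (Fraction(1, 3), Fraction(1, 2))
```

With further covariates such as L and the left limit of K, the increments would instead come from a least-squares fit on the individuals at risk at each event time.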

We are now able to identify the controlled direct effect from surgery. Note 
that this is just a slight variation of the model considered in [19] . 



Lemma 2. If cr 1 and a 2 are two V Tf -"predictable processes such that 



E P 



L I h t Y t exp 

T 







K_ s ^f ds ) dt 



Ep 



o 

-A,B,C 



h t Y t exp I / A'_ s Vf ds I dt 







- i-T 


dt 


= Ep 


/ h t o-\ dt 






Jo 






- rT 


dt 


= E P 


/ h t af dt 






Jo * . 



for every F Q ' ' -predictable and bounded process h, then 



Ep 



(6.8) 



T 



9tYdB t 



Ep b 



£ gt Y t + tfe*^ + e*A^t + e*Kt-tf^ dt 



for every J 7 ^' -predictable and bounded process g. 

Sketch of proof. By Theorem 1, there exist an J^-measurable ran- 
dom variable Wg and an J^'^'^-measurable random variable Wq such that 



dP e 



dP\r 



dPg\ t l,A 

±°- = J^ 1 Wl and — = Wl . 



dP 



o ■ 



If Hi is J-^-measurable, Hi := Ep[Hi \J- A ] and H2 is J-Q^-measurable, then 
(6.9) E Pe [HiH 2 ] = E P [HiH 2 W l ] = E P [HiH 2 W^} = E Pe [HiH 2 ]. 
Similarly, let h be a bounded and $\mathcal{F}_t$-predictable process, and let 
$\mu^B_s := Y_s(\psi^0_s + A\psi^A_s + L\psi^L_s + K_{s-}\psi^K_s)$, and note that 



e Pr 



h. dB q 



Ep 
Ep 
Ep 
Ep 
Ep 
Ep, 



T 



h s dB s W T 



T 



h s W s - dB s 

[ h s W s -dB s 
Jo 

[ h s W s -vfds 
Jo 

T 



+ E P 



h s d[B,W], 



h s nf dsW T 



h s /j, s ds 



by [14], Proposition I 3.14 



One can show that there exists an intermediate probability measure P on 
Tt such that: 

(1) 

P e y T «P«Py T . 

(2) For every bounded and Borel-measurable function h: 

• E p [h(A)]=h(9*A), P-a.s.; 

. Ep[h(L)\tf,]=E P [h(L)\tf]; 

. Ep [h(K ) \rf' L ] =Ep[h(K )\tf> L ]; 

. Ep[h{B Q )\Tt' L ' K ]=E P [h{B Q )\T^ L ' K }. 

(3) Whenever h is a bounded and JVpredictable process, then: 



Ep 







- pT 


/ h s dB s 


= Ep 


/ h s fif ds 


Jo 




Jo 



if fi K and /i G are J^-predictable processes such that 











Ep 


/ h s dK s 


= Ep 


C h sl xf dUf 




Jo 




Jo 










Ep 


/ h s dC s 


= Ep 






Jo 




Jo 
then 



Ei 



tie (IK* 



r rT 



Ei 



Ei 



h^dXjf 



T 



Note that by [13], Proposition 4.3, there exists an J- t 
gale H such that 



A.L.B 



adapted P-martin- 



dP e 



and 
(6.10) 



ST 



dP\j-A,L,B,C 



[B,E} = 



Y exp 



TKr^dr). 



o 



Bayes's formula with predictable projections shows that 
rT 



(6.11) 



E, 



L 



YMe ds 



o 



Ei 



T a l ■ 

Y s h s -^ds 
cri 



for every bounded and $\mathcal{F}^{A,B,C}_t$-predictable process h. Now, 



Ep„ 



T 



LheYe ds 



Ei 



Ei 



E f 



E p 



EPa 



Lh s Y s dsr^T 



Ho_L/i,Y, ds 



a 1 

^s—hsYs k ds 
a 1 

h s Y s ^dsE T 
at 



t a i ■ 

h s Y s -^ds 
cri 



by (6.10) 



for every bounded $\mathcal{F}^{A,B,C}_t$-predictable process h, which implies that (6.9) 
holds. □ 



6.1.3. Consistency of the modified sequential G-estimator. We are now 
able to show that the modified sequential G-estimator suggested in [19] is 
uniformly consistent, also when we consider a time-dependent mediating 
treatment K. Let $\theta_1, \theta_2$ be two actions as in the previous proposition, but 
where $\theta_1^* A = 0$ and $\theta_2^* A = 1$, and consider corresponding $\mathcal{F}^{A,B,C}_t$-predictable 
processes $\gamma^1$ and $\gamma^2$ as the fractions in (6.11). Furthermore, we assume that 
our observations consist of the event histories for n independent and identically 
distributed individuals, following the current generic model. We will also 
abuse the notation slightly and let N, from now on, denote the corresponding 
counting process that is aggregated over the n independent individuals. 

Lemma 3. Let ^>° l ^! A ^ L and ^> K denote the usual additive regression 
estimators of Aalen, let Y := Y t B Y t c and define 

rt 



( 



M t := N{ 



It 



It 



2 ' 



r 



t ■- 



1 JiTfi 



\ 



H t := diag 



Y^exp 



' Yt ex P 



H f := dias 



Ffexp 
/I A x 



Z t := Y t 



Zf- '■— (.Zj-Hg-Zs-) 1 Z];_H S -, 



(Zj_H s -Z s -) 1 Z^_H S 



and 



Tt 



Z?_ dN< 



We have that 
(6.12) 

for every 5 > 0. 



z*Lk,-&k. 



lim P[ sup \T t — T t \ > 5 

n— >ca \t<x 



Proof. Note that 



(6.13) 
(6.14) 



Z? dN* 



z?_K a -d*?-r t 



7ll 



Z? dN s + / (Zt - Z?_)K 8 - d* 



H 



7 H 



,K 
(6.16) 
Let 
ft _ 



Z?_dM a + I Zf_K s -d{V 







s s ) 



+ f Zf_{Z s _ L s -)d( *A 



V 



1 

-1 1 



We have that V Zj_H s _Z s _V = St- where St- is a 2 x 2-diagonal matrix. 
Moreover, {Z^R S -Z S -Y X = VS t -V T . 

Note that | f Z^_ dM s ^ is Lenglart dominated by Tr(J" dM s ) and 



Tr 



Z? dNL 



ds 



rT 

= / TiiZj^Hs-Z^Z^Hs-dmgfiHs-Zs-iZ^Hs-Z^y 1 
Jo 

rT 

< / Tr(Z^H s -Z s -y 1 \\diagiJ 1 H s -\\ op ds, 
Jo 

which converges in probability to 0. By Lenglart's inequality [14], we obtain 
that J Q Zf_dM s converges uniformly to in probability with respect to P. 





Since 



S— >c 



lim P[ sup \Z S K S \ > 8 ) = and lim P( sup \Z S K S \ > 8 ) = 



S— >c 



and ^ A converges uniformly in probability to ^ K , we also have that 



/' 

Jo 



Z*_K s -d{^* - 



and 



(Z?_-Z?_)K s _d® 



K 



converge uniformly in probability to 0 w.r.t. P. This shows that (6.15) converges 
uniformly in probability to 0 w.r.t. P as well. 
We have that 



Zf_L s . 



vs: 



i=l 



V 



v 



i=l 



J 



The law of large numbers implies that Z^_L S - converges in P-probability 
to V^y(s). Now, (6.16) equals 

(6.17) 



/ Zf_L,--V7{8)d9l 
Jo 
and 



E P 



.18) 



sup 

t 



< 



Z H L S 



Ep[\(Z?-Ls 



ds. 



Therefore (6.16) converges uniformly in probability w.r.t. P. 

A computation shows that | f Z£L — Z^_ dN s 1 1 is Lenglart dominated by 



V 



+ 



ds. 



This process converges uniformly in probability to 0, so we see that (6.14) 
also converges uniformly in probability to 0. This means that T — T converges 
uniformly in probability to 0, so T actually converges to T in the similar 
sense. □ 

The cumulative Pg i -hazard of B is given by 



(6.19) 



A 



V s u + 0*A^ + 9*K S ^ + la tf ds. 



Since stochastic integrals are continuous with respect to uniform conver- 
gence on compacts in probability, we see that 
r-t 



lim P sup 

<5^o V t 



(1, 



dT s 



A 



>5 



0, 



that is, we obtain a consistent estimator of A t *. A consistent estimator for 
the controlled direct effect of A on B is given by the second component of T. 

7. Discussion. The primary concern in this paper is the possibility of 
estimating parameters for the counterfactual situation from the observational 
data, given that the counterfactual model is correct. This mainly comes down 
to whether the counterfactual probability is absolutely continuous with 
respect to the factual probability and whether the counterfactual parameters 
of interest are identifiable. The previously mentioned related works by Arjas 
and Parner, [3] and [4], construct counterfactual probability distributions by 
piecing stochastic intervals together as in [13], Section 3. Unlike Parner and 
Arjas, we take a more martingale oriented approach, also based on the seminal 
paper [13]. This enables us to apply well-established 
methods from stochastic analysis and martingale theory directly. In fact, a surprisingly 
large part of causal inference can be well understood in terms of martingale 
measures, Bayes's rule and Girsanov's theorem. This approach translates 
the problem of data re-weighting directly into a thoroughly studied problem 
in the literature, that is, whether the stochastic exponential of a local 
martingale defines a martingale; see [17] and [15]. 
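
The simplest instance of such a re-weighting may help fix ideas. The sketch below assumes a homogeneous Poisson process whose intensity is changed from `lam` to `mu`; the likelihood ratio on [0, T] is then (mu/lam)^{N_T} exp(-(mu - lam) T). The truncation level `n_max` and all names are assumptions for this illustration, which only checks numerically that the weights have mean one and reproduce the mean number of jumps under the new intensity.

```python
import math

def likelihood_ratio(n_jumps, lam, mu, horizon):
    """Likelihood ratio on [0, horizon] when the intensity changes from lam to mu."""
    return (mu / lam) ** n_jumps * math.exp(-(mu - lam) * horizon)

def expectation_under_p(f, lam, horizon, n_max=80):
    """E_P[f(N_T)] for a Poisson process with rate lam, truncated at n_max jumps."""
    mean = lam * horizon
    pmf, total = math.exp(-mean), 0.0
    for n in range(n_max):
        total += f(n) * pmf
        pmf *= mean / (n + 1)  # Poisson recursion p(n + 1) = p(n) * mean / (n + 1)
    return total

LAM, MU, T = 1.0, 2.0, 1.5
mass = expectation_under_p(lambda n: likelihood_ratio(n, LAM, MU, T), LAM, T)
mean_jumps = expectation_under_p(lambda n: n * likelihood_ratio(n, LAM, MU, T), LAM, T)
```

Here `mass` is numerically one and `mean_jumps` is numerically `MU * T`, so observations generated with rate `lam` and weighted by the likelihood ratio behave like observations generated with rate `mu`.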

Another difference from the work of Arjas and Parner is that we consider 
an explicit intervention in terms of a transformation on the sample space. 
While not absolutely necessary, it still provides additional clarification, 
as it makes the notion of counterfactual outcomes more explicit, or perhaps 
even demystified. The notation do(X = x), [21], is simply interpreted as the 
measurable transformation on the sample space that forces every outcome 
of X into x and leaves the remaining observations unchanged. When the 
action becomes more complex than just forcing a variable into a fixed value, 
this interpretation becomes even more appealing. 
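
This reading of do(X = x) can be mimicked in a few lines of code. The sketch below assumes that a single outcome is encoded as a dictionary of variable values; the names `do` and `pull_back` and the example outcome are hypothetical, and `pull_back` is meant to correspond to composing a random variable with the transformation.

```python
def do(**forced):
    """Return the transformation that forces the named coordinates to fixed values."""
    def theta(outcome):
        transformed = dict(outcome)  # copy, so the factual outcome is untouched
        transformed.update(forced)
        return transformed
    return theta

def pull_back(variable, theta):
    """Random variable obtained by evaluating `variable` after applying theta."""
    return lambda outcome: variable(theta(outcome))

factual = {"A": 0, "L": 1, "K": 0, "B": 1}  # hypothetical single outcome
theta = do(A=1)
forced_a = pull_back(lambda outcome: outcome["A"], theta)
unchanged_l = pull_back(lambda outcome: outcome["L"], theta)
```

Evaluating `forced_a(factual)` gives the forced value 1, while `unchanged_l(factual)` returns the factual value of L. The transformation acts on single outcomes only; the counterfactual distribution itself is a separate object.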

The introduction of the transformation $\theta$ sheds some light on another 
aspect: One may in fact think of a causal inference problem as a stochastic 
control problem, or a decision problem, where the assumptions about the 
model are kept as modest as possible. The main objective in stochastic 
control theory is to find an optimal intervention strategy and compute the 
corresponding expected payoff. Causal inference appears as a special case 
of this, in the sense that there one mostly considers only one intervention 
strategy, namely the transformation $\theta$, and aims to compute the expected 
payoff. 

One is often confronted with latent factors in epidemiological settings. 
This lack of information typically yields nonidentifiable effects. In special 
situations, one can use graphical arguments to ensure identifiability of 
counterfactual parameters and also provide exact formulas for these. Examples 
are the back-door formula, front-door formula and sequential back-door 
formula [21], Sections 3.3.1, 3.3.2, 4.4.3 and [11]. We show that we may 
take advantage of the local independence graphs to identify causal effects in 
event-history analysis. 

When the counterfactual effect is possibly unidentifiable, one may try to 
compute upper and lower bounds for it. This can also be thought of as a 
control problem where "nature" is allowed to control the latent factors 
in order to maximize or minimize counterfactual effects. This corresponds to 
an optimization problem under constraints. The latent variables may only 
be altered in such a way that the observable factors maintain the same 
joint distribution and also such that some given directed graph still 
defines a local independence graph. Let S denote the set of counterfactual 
distributions corresponding to these constraints. The "causal effect" would 
then be sandwiched by $\inf_{P' \in S} E_{P'}[\eta] \le E_{P_\theta}[\eta] \le \sup_{P'' \in S} E_{P''}[\eta]$. 
The set S may have a somewhat complicated geometry. If one instead 
considers the convex hull, one obtains other, not necessarily tight, bounds: 

$\inf_{P' \in \mathrm{conv}(S)} E_{P'}[\eta] \le E_{P_\theta}[\eta] \le \sup_{P'' \in \mathrm{conv}(S)} E_{P''}[\eta]$. 

These bounds may be computed by already developed linear programming 
techniques. This approach was for instance taken in [5], but is likely to 
generalize to more complicated continuous-time scenarios as well. 

APPENDIX 
Uniqueness of counterfactual distributions. 

Lemma A.1. There exists at most one counterfactual distribution $P_\theta$ on 
$\mathcal{F}_0$ that imposes contemporaneously independent outcomes. 

Proof. Let $T_1, \dots, T_m$ be an enumeration of $\{T(V)\}_{V \in \mathcal{V}}$ such that j < k 
implies $T_j < T_k$. 

Assume that P' and P'' are two counterfactual distributions that have 
contemporaneously independent outcomes and η is an $\mathcal{F}_0$-measurable random 
variable. Let $\{X_i\}_i$ be an enumeration of $\{V \in \mathcal{A}' \mid T(V) = T_1\}$ and let 
$\{A_j\}_j$ be an enumeration of $\{V \in \mathcal{A} \mid T(V) = T_1\}$. Whenever $\{h_i\}_i$ and $\{g_j\}_j$ 
are two families of bounded and measurable functions, then 

$E_{P'}\left[\prod_i h_i(X_i) \prod_j g_j(A_j)\right] = E_{P'}\left[\prod_i h_i(X_i)\right] E_{P'}\left[\prod_j g_j(A_j)\right]$ 
$= \prod_i E_{P'}[h_i(X_i)]\, E_{P'}\left[\prod_j g_j(A_j)\right]$ 
$= \prod_i E_{P''}[h_i(X_i)]\, E_{P''}\left[\prod_j g_j(A_j)\right]$ 
$= E_{P''}\left[\prod_i h_i(X_i)\right] E_{P''}\left[\prod_j g_j(A_j)\right]$ 
$= E_{P''}\left[\prod_i h_i(X_i) \prod_j g_j(A_j)\right]$. 

This shows that if η is a bounded random variable that only depends on 
the information at $T_1$, then $E_{P'}[\eta] = E_{P''}[\eta]$. We continue with an induction 
argument and assume that $E_{P'}[\eta] = E_{P''}[\eta]$ for every bounded random 
variable η that only depends on $\{V \in \mathcal{V} \mid T(V) < T_k\}$ and aim to prove that 
this also holds if η depends on the information at time $T_k$. Let $\{X_i\}_i$ be 






an enumeration of $\{V \in \mathcal{A}' \mid T(V) = T_k\}$ and let $\{A_j\}_j$ be an enumeration of 
$\{V \in \mathcal{A} \mid T(V) = T_k\}$. Whenever $\{h_i\}_i$ and $\{g_j\}_j$ are two families of bounded 
and measurable functions, then 



Epi 



TjEj 



IhiXi)^ \e*g 3 {A 3 ) 



Epi 



Epn 



rt^E^Mx^^lle*^ 

* 3 

,n^p''^(^)i^o (vi) ]ii r *(^) 



Er 



rjEp/i 



Jhi(Xi 



■p(Vi) 



Er 



vUhiiX^UgjiAj) 



This proves the induction hypothesis, that is, $E_{P'}[\eta] = E_{P''}[\eta]$ whenever η 
depends on $\{V \in \mathcal{V} \mid T(V) \le T_k\}$. □ 

Theorem 4. There exists at most one probability measure on $\mathcal{F}_T$ that 
simultaneously satisfies (3.4), (3.5), (3.8) and (3.9). 

Proof. Recall definition (4.16). (3.8) and (3.9) imply that 

(A.1) $E_{P_\theta}\left[\int_J\int_0^T h(s,x)\,N(ds,dx)\right] = E_{P_\theta}\left[\int_J\int_0^T h(s,x)\,\nu^\theta(ds,dx)\right]$. 

Now [13], Theorem 3.4, implies that there exists at most one probability 
measure on $\mathcal{F}_T$ that coincides with $P_\theta$ on $\mathcal{F}_0$ and satisfies (A.1). □ 

Dual predictable projections. 

Lemma A.2. Let U denote the dual predictable projection of N with 
respect to Q onto the filtration $\mathcal{F}_t$. 

(1) If h is a bounded and $\mathcal{P}^V$-measurable process, then 

$\int_{J_V}\int_0^t h(s,x)\,U(ds,dx)$ 

defines an $\mathcal{F}^V_t$-predictable process of finite variation. 

(2) If h and h' are bounded and $\mathcal{P} \otimes \mathcal{J}$-measurable processes, then 

(A.2) $\left[\int_{J_V}\int_0^\cdot h(s,x)\,U(ds,dx), \int_{J_{V'}}\int_0^\cdot h'(s,x)\,U(ds,dx)\right] = 0$, 

(A.3) $\left[\int_{J_V}\int_0^\cdot h(s,x)\,U(ds,dx), \int_{J_{V'}}\int_0^\cdot h'(s,x)\,N(ds,dx)\right] = 0$, 

Q-a.s. whenever $V \neq V'$. 

(3) There exists a nonnegative and $\mathcal{P} \otimes \mathcal{J}$-measurable process λ such that 

$E_P\left[\int_J\int_0^T h(s,x)\,N(ds,dx)\right] = E_P\left[\int_J\int_0^T h(s,x)\lambda(s,x)\,U(ds,dx)\right]$ 

for every bounded and $\mathcal{P} \otimes \mathcal{J}$-measurable process h. 



(A.4) 



Proof. The integral equation 

(A.4) $\int_J\int_0^T h(s,x)\,N^V(ds,dx) = \int_{J_V}\int_0^T h(s,x)\,N(ds,dx)$ 

defines a multivariate point process $N^V$ with mark space J which only jumps 
at marks in $J_V$. [13], Theorem 2.1, provides a dual predictable projection 
$U^V$ of $N^V$ with respect to the reference measure Q onto the filtration $\mathcal{F}^V_t$. 
Let h be a bounded and $\mathcal{P} \otimes \mathcal{J}$-measurable process. [14], Theorem I 2.2.ii 
and a monotone class argument provide a bounded and $\mathcal{P}^V \otimes \mathcal{J}$-measurable 
process $h^V$ such that 

$h^V(\cdot,\cdot) = E_Q[h(\cdot,\cdot) \mid \mathcal{F}^V_T]$, Q-a.s. 

Now, 

$E_Q\left[\int_{J_V}\int_0^T h(s,x)\,U(ds,dx)\right] = E_Q\left[\int_{J_V}\int_0^T h(s,x)\,N(ds,dx)\right]$ 
$= E_Q\left[\int_J\int_0^T h(s,x)\,N^V(ds,dx)\right]$ 
$= E_Q\left[\int_J\int_0^T h^V(s,x)\,N^V(ds,dx)\right]$ 
$= E_Q\left[\int_J\int_0^T h^V(s,x)\,U^V(ds,dx)\right]$ 
$= E_Q\left[\int_J\int_0^T h(s,x)\,U^V(ds,dx)\right]$, 

which proves the first claim. 
To prove (A.2), let $W \subset J_V$ and $W' \subset J_{V'}$ be measurable subsets and 
consider the corresponding counting processes 

$N^W_t := N([0,t], W)$ and $N^{W'}_t := N([0,t], W')$ 

and let 

$U^W_t := U([0,t], W)$ and $U^{W'}_t := U([0,t], W')$. 

Following [13], Proposition 2.3, we see that 

$\Delta U^W_s = E_Q[\Delta N^W_s \mid \mathcal{F}_{s-}]$ and $\Delta U^{W'}_s = E_Q[\Delta N^{W'}_s \mid \mathcal{F}_{s-}]$, Q-a.s. 

Now, 

$0 \le E_Q\left[[U^W, U^{W'}]_T\right] = E_Q\left[\sum_{s \le T} \Delta U^W_s \Delta U^{W'}_s\right]$ 
$\le \sum_{s \le T} E_Q[\Delta U^W_s \Delta U^{W'}_s]$ by Fatou's lemma 
$= \sum_{s \le T} E_Q\left[E_Q[\Delta N^W_s \mid \mathcal{F}_{s-}]\, E_Q[\Delta N^{W'}_s \mid \mathcal{F}_{s-}]\right]$ 
$= \sum_{s \le T} E_Q[\Delta N^W_s \Delta N^{W'}_s]$ 
$= 0$, 

so $[U^W, U^{W'}] = 0$, Q-a.s. 

Whenever f and f' are bounded and $\mathcal{F}_t$-predictable processes, we have 

(A.5) $\left[\int_0^\cdot f_s\,dU^W_s, \int_0^\cdot f'_s\,dU^{W'}_s\right] = \int_0^\cdot f_s f'_s\,d[U^W, U^{W'}]_s = 0$, Q-a.s. 

Equation (A.2) is therefore satisfied in the special case with $h = f \cdot \chi_W$ 
and $h' = f' \cdot \chi_{W'}$. The general case now follows from an application of the 
monotone class theorem. Equation (A.3) follows from a similar 
argument. 

For the last claim, let ν denote the dual predictable projection of N with 
respect to P onto the filtration $\mathcal{F}_t$ and note that $\nu \ll U$ since $P \ll Q$. The 
existence of λ then follows directly from [13], Theorem 4.1. □ 